Accelerating overlapping community detection: Performance tuning a stochastic gradient Markov chain Monte Carlo algorithm
Building efficient algorithms for data-intensive problems requires deep analysis of data access patterns, and random data access patterns make this analysis harder. In this paper, we discuss accelerating a randomized, data-intensive machine learning algorithm using multi-core CPUs and several types of GPUs. A thorough analysis of the algorithm's data dependencies enabled a 75% reduction in its memory footprint. We created custom compute kernels via code generation to identify the optimal set of data placement and computational optimizations for each compute device. An empirical evaluation shows up to a 245x speedup compared to an optimized sequential version. The evaluation also shows that achieving peak performance does not always match intuition: for example, depending on the GPU architecture, vectorization may either improve or hamper performance.
ConPaaS: an Integrated Runtime Environment for Elastic Cloud Applications
Most Cloud applications are re-enactments of traditional enterprise applications such as Web applications, content delivery and e-commerce [1]. The advantages of the Cloud are well-known: access to a near-infinite number of resources